Performance Metrics, Error Modeling, and Uncertainty Quantification

Authors

  • YUDONG TIAN
  • GREY S. NEARING
  • CHRISTA D. PETERS-LIDARD
  • KENNETH W. HARRISON
  • LING TANG
Abstract

A common set of statistical metrics has been used to summarize the performance of models or measurements, the most widely used being bias, mean square error, and the linear correlation coefficient. These metrics assume linear, additive, Gaussian errors, and they are interdependent, incomplete, and incapable of directly quantifying uncertainty. The authors demonstrate that these metrics can be derived directly from the parameters of a simple linear error model. Since a correct error model captures the full error information, it is argued that the specification of a parametric error model should be an alternative to the metrics-based approach. The error-modeling methodology is applicable to both linear and nonlinear errors, whereas the metrics are only meaningful for linear errors. In addition, the error model expresses the error structure more naturally and quantifies uncertainty directly. This argument is further explained by highlighting the intrinsic connections between the performance metrics, the error model, and the joint distribution between the data and the reference.

1. Limitations of performance metrics

One of the primary objectives of measurement or model evaluation is to quantify the uncertainty in the data. This is because the uncertainty directly determines the information content of the data (e.g., Jaynes 2003) and dictates our rational use of the information, be it for data assimilation, hypothesis testing, or decision-making. Further, by appropriately quantifying the uncertainty, one gains insight into the error characteristics of the measurement or model, especially via efforts to separate the systematic error from the random error (e.g., Barnston and Thomas 1983; Ebert and McBride 2000). Currently the common practice of measurement or model verification is to compute a common set of performance metrics: statistical measures that summarize the similarity and difference between two datasets.
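As a concrete sketch of the three metrics named in the abstract, the snippet below computes bias, MSE, and CC for a synthetic prediction that obeys the assumed linear, additive, Gaussian error structure. The function name `big_three` and the parameter values (offset 0.5, slope 0.8, noise standard deviation 0.3) are illustrative choices, not from the paper.

```python
import numpy as np

def big_three(x, y):
    """Bias, mean square error, and linear correlation coefficient
    between a reference x and a prediction y (illustrative helper)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    bias = np.mean(y - x)
    mse = np.mean((y - x) ** 2)
    cc = np.corrcoef(x, y)[0, 1]
    return bias, mse, cc

# Synthetic data following a linear, additive, Gaussian error model:
# y = 0.5 + 0.8 * x + eps, with eps ~ N(0, 0.3^2)
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 10_000)                   # reference
y = 0.5 + 0.8 * x + rng.normal(0.0, 0.3, 10_000)   # prediction with known error
bias, mse, cc = big_three(x, y)
```

Because the error model is fully specified here, the sample metrics converge to values computable from its parameters alone (e.g., the bias tends to the 0.5 offset), which previews the paper's central point that the metrics are functions of the error-model parameters.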
[Corresponding author address: Yudong Tian, NASA Goddard Space Flight Center, Mail Code 617, Greenbelt, MD 20771-5808. E-mail: [email protected]. February 2016, Tian et al., p. 607, DOI: 10.1175/MWR-D-15-0087.1. © 2016 American Meteorological Society.]

These metrics are based on direct comparison of datum pairs at their corresponding spatial/temporal locations. The most commonly used ones are bias, mean square error (MSE), and correlation coefficient (CC) (e.g., Fisher 1958; Wilks 2011), but many variants or derivatives, such as (un)conditional bias (e.g., Stewart 1990), MSE (Murphy and Winkler 1987), unbiased root-mean-square error (ubRMSE), anomaly correlation coefficient, coefficient of determination (CoD), and skill score (SS; e.g., Murphy and Epstein 1989), also fall into this category. Table 1 lists some of these metrics and their definitions. Among them, the "big three" (bias, MSE, and CC) are the most widely used in diverse disciplines, exemplified by the popular "Taylor diagram" (Taylor 2001). These metrics do, however, have several limitations:

1) Interdependence. Most of these conventional performance metrics are not independent; they have been demonstrated to relate to each other in complex ways. For example, the MSE can be decomposed in many ways to link it with other metrics, such as bias and correlation coefficient (e.g., Murphy 1988; Barnston 1992; Taylor 2001; Gupta et al. 2009; Entekhabi et al. 2010). These relations indicate both redundancy among the metrics and the metrics' indirect connection to independent error characteristics, which leads to ambiguity in the interpretation and intercomparison of the metrics.

2) Underdetermination. It is easy to verify that these metrics do not describe unique error characteristics, even when many of them are used collectively. In fact, many different combinations of error characteristics can produce the same values of these metrics. This is illustrated in Fig. 1.
A monthly time series of land surface temperature anomaly data, extracted from satellite-based observations (Wan et al. 2004) over a location in the United States (35°N, 95°W), is used as the reference (black curves) for validating two separate hypothetical sets of predictions (Figs. 1a and 1b, blue curves). Their respective scatterplots are also shown (Figs. 1c and 1d), with values of five major conventional metrics listed (bias, MSE, CC, CoD, and SS). Whether seen from the time series plots or the scatterplots, the two measurements exhibit very different error characteristics. However, all the metrics except bias give nearly identical values (Figs. 1c and 1d). In fact, there is an infinite number of ways to construct measurements that produce identical values for many of the metrics. Therefore, when given a set of values of these metrics, one will have fundamental difficulty inferring and communicating the error characteristics of the predictions.

3) Incompleteness. There are no well-accepted guidelines on how many of these metrics are sufficient. Many inexperienced users follow a "the more the better" philosophy, and it is not rare to see works

[Table 1. Examples of conventional performance metrics. The observations and forecasts are denoted as x and y, respectively; columns give each metric's name, definition, and ideal value.]
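The underdetermination illustrated in Fig. 1 can be reproduced numerically. For a linear error model y = a + b·x + ε with Var(x) = s² and Var(ε) = σ², one has MSE = a² + (b − 1)²s² + σ² and CC = b·s/√(b²s² + σ²), so two structurally different predictions can be tuned to share MSE and CC. The specific parameter values below (slope 0.8, offset 0.2993, noise 0.48) are hypothetical, solved from those two expressions; they are not the predictions used in the paper's figure.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(0.0, 1.0, n)  # synthetic reference with unit variance

# Prediction A: unbiased, slope 1, purely random error with sigma = 0.60
y_a = x + rng.normal(0.0, 0.60, n)
# Prediction B: very different structure -- damped slope (0.8) plus a
# constant offset and a noise level solved analytically so that B's MSE
# and CC match A's (MSE = 0.36, CC ~ 0.8575 for both)
y_b = 0.2993 + 0.8 * x + rng.normal(0.0, 0.48, n)

def metrics(ref, pred):
    bias = np.mean(pred - ref)
    mse = np.mean((pred - ref) ** 2)
    cc = np.corrcoef(ref, pred)[0, 1]
    return bias, mse, cc

bias_a, mse_a, cc_a = metrics(x, y_a)
bias_b, mse_b, cc_b = metrics(x, y_b)
# MSE and CC agree to sampling precision; only bias tells A and B apart,
# mirroring the behavior of the two hypothetical predictions in Fig. 1
```

As in the figure, a reader handed only (MSE, CC) cannot tell whether the error is pure noise (A) or a mix of conditional bias, offset, and noise (B), whereas the error-model parameters (a, b, σ) distinguish the two cases immediately.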

